Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Libbeat][New Processor] XML Decode #23678

Merged
merged 25 commits into from
Feb 15, 2021
Merged

Conversation

P1llus
Copy link
Member

@P1llus P1llus commented Jan 26, 2021

What does this PR do?

This PR adds 2 components to Libbeat.

  1. It adds a small XML Unmarshal helper function to the common folder in Libbeat, the reason this is added is so that not only processors can utilize it if/when needed, but also inputs. For example I plan to reuse this in http_endpoint to add XML support there as well. Other beats/inputs that would benefit from this is Winlogbeat.
    The helper function expects a []byte with valid XML object(s), and will return it as a struct reusing the names of the XML object(s).
    It supports lists, arrays, nested fields, object identifiers (for example <book seq="1">).
    If the XML is not valid it will return a proper error message describing why it failed.
    For more information on supported formats, please see the included unit test files.

  2. Second component is a xmldecode processor, this is the first processor I made, so please review it carefully. I have used urldecode, mime_type and the decode_json_fields processor as a guideline for this one.
    For the supported objects and more information on supported formats, please see the included unit test files.

Why is it important?

This adds the possibility to read XML files for file/log input, or decoding XML strings in existing message fields, for example XML strings in existing JSON fields.
Implements an easy helper function for other parts of beats to reuse if they want to Unmarshal XML into a struct.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 26, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jan 26, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #23678 updated

  • Start Time: 2021-02-15T10:04:43.755+0000

  • Duration: 51 min 22 sec

  • Commit: 76f6582

Test stats 🧪

Test Results
Failed 0
Passed 45880
Skipped 4762
Total 50642

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 45880
Skipped 4762
Total 50642

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This can be a nice addition. I have added some questions and suggestions.

libbeat/common/xml_test.go Outdated Show resolved Hide resolved
libbeat/common/xml_test.go Outdated Show resolved Hide resolved
libbeat/processors/xmldecode/docs/xmldecode.asciidoc Outdated Show resolved Hide resolved
libbeat/processors/xmldecode/xmldecode.go Outdated Show resolved Hide resolved
libbeat/processors/xmldecode/xmldecode.go Outdated Show resolved Hide resolved
libbeat/common/xml.go Outdated Show resolved Hide resolved
libbeat/processors/xmldecode/xmldecode.go Outdated Show resolved Hide resolved
libbeat/processors/xmldecode/xmldecode.go Outdated Show resolved Hide resolved
libbeat/processors/xmldecode/xmldecode_test.go Outdated Show resolved Hide resolved
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor thing, for the rest it LGTM.

libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
@jsoriano jsoriano added the needs_backport PR is waiting to be backported to other branches. label Jan 26, 2021
@andrewkroh andrewkroh self-requested a review January 26, 2021 21:26
@andresrc andresrc added the Team:Elastic-Agent Label for the Agent team label Jan 27, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@P1llus
Copy link
Member Author

P1llus commented Jan 29, 2021

Started off another build, as some CI's are failing, but did not seem to be because of my changes.

@P1llus
Copy link
Member Author

P1llus commented Jan 29, 2021

jenkins run tests please

@P1llus
Copy link
Member Author

P1llus commented Feb 3, 2021

jenkins run tests

@jsoriano
Copy link
Member

jsoriano commented Feb 4, 2021

/package

@P1llus
Copy link
Member Author

P1llus commented Feb 4, 2021

@jsoriano do you have any idea how we can make these builds pass? Iwe rerun the builds about 6 times this week, last week the kibana snapshot made the builds fail and this week I am still a bit unsure.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice addition. I'd like to take a look at it again after you address the comments.

libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/processors/decode_xml_fields/decode_xml_fields.go Outdated Show resolved Hide resolved
libbeat/common/xml.go Outdated Show resolved Hide resolved
libbeat/common/xml.go Outdated Show resolved Hide resolved
@P1llus
Copy link
Member Author

P1llus commented Feb 8, 2021

Adding in the last changes, though still a few more left before a second review is ready.

@P1llus
Copy link
Member Author

P1llus commented Feb 8, 2021

@andrewkroh
Added in the changes you requested, I also moved Target from single string to pointer again, as there was no easily or feasible way to determine if the value was null or empty string (for adding messages to root).

The libbeat helper was also moved to /libbeat/common/enc/mxj after talking to @urso

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
@P1llus
Copy link
Member Author

P1llus commented Feb 9, 2021

jenkins run tests please

P1llus and others added 12 commits February 14, 2021 15:34
$ benchcmp old.txt new.txt
benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15691         15686         -0.03%
BenchmarkProcessor_Run/nested_and_array_object-12     39673         39098         -1.45%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            158            +0.00%
BenchmarkProcessor_Run/nested_and_array_object-12     376            374            -0.53%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          8597          +0.00%
BenchmarkProcessor_Run/nested_and_array_object-12     20310         19798         -2.52%
benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15686         8051          -48.67%
BenchmarkProcessor_Run/nested_and_array_object-12     39098         20540         -47.47%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            75             -52.53%
BenchmarkProcessor_Run/nested_and_array_object-12     374            184            -50.80%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          3520          -59.06%
BenchmarkProcessor_Run/nested_and_array_object-12     19798         7824          -60.48%
benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15686         8051          -48.67%
BenchmarkProcessor_Run/nested_and_array_object-12     39098         20540         -47.47%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            75             -52.53%
BenchmarkProcessor_Run/nested_and_array_object-12     374            184            -50.80%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          3520          -59.06%
BenchmarkProcessor_Run/nested_and_array_object-12     19798         7824          -60.48%
@P1llus
Copy link
Member Author

P1llus commented Feb 15, 2021

jenkins run tests please

@kuisathaverat
Copy link
Contributor

lint stage failed

[2021-02-15T08:08:06.836Z] + make -C libbeat check

[2021-02-15T08:08:06.836Z] make: Entering directory '/var/lib/jenkins/workspace/Beats_beats_PR-23678/src/github.com/elastic/beats/libbeat'

[2021-02-15T08:08:13.413Z] >> fmt - go-licenser: Checking for missing headers

[2021-02-15T08:08:15.951Z] >> fmt - go-licenser: Adding missing headers

[2021-02-15T08:08:15.951Z] >> fmt - autopep8: Formatting Python code

[2021-02-15T08:08:15.951Z] >> fmt - goimports: Formatting Go code

[2021-02-15T08:08:42.504Z] >> check: Checking source code for common problems

[2021-02-15T08:09:21.252Z] # github.com/elastic/beats/v7/libbeat/common/encoding/xml

[2021-02-15T08:09:21.252Z] common/encoding/xml/decode_test.go:302:1: ExampleXMLToJSON refers to unknown identifier: XMLToJSON

[2021-02-15T08:09:26.523Z] Error: failed running go vet, please fix the issues reported: running "go vet ./..." failed with exit code 2

[2021-02-15T08:09:26.524Z] scripts/Makefile:152: recipe for target 'check' failed

[2021-02-15T08:09:26.524Z] make: *** [check] Error 1

[2021-02-15T08:09:26.524Z] make: Leaving directory '/var/lib/jenkins/workspace/Beats_beats_PR-23678/src/github.com/elastic/beats/libbeat'

@P1llus
Copy link
Member Author

P1llus commented Feb 15, 2021

Thanks @kuisathaverat . Sorry for the build spam, it has had quite some issues with failing test runs the whole week, mostly unrelated however this one is related for sure.

@P1llus
Copy link
Member Author

P1llus commented Feb 15, 2021

The name of the Example test for the XML package was not set according to what go vet expects, changed the name.

@P1llus P1llus merged commit 6839307 into elastic:master Feb 15, 2021
P1llus added a commit to P1llus/beats that referenced this pull request Feb 15, 2021
* stashing before initial commit

* Initial commit

* updating go.sum

* updating it again

* adding feedback from PR comments and removing expandkeys config entry

* Updating changelog

* removing expanded_keys from allowed fields

* adding new changes based on PR comments, a few more changes remains

* moving the xml decoder to its own subpackage based on PR comments

* reverting back to Target being a string pointer, to be able to differentiate between null and empty string

* Updating certain tests to fit the new ignore_failure and ignore_missing options

* Updating unit test to test with missing field

* updating license headers

* adding benchmark test

* benchmark, now also with allocation results

* updating changelog entry

* removing duplicate Changelog entry

* changing changelog entry name to new name

* Simplify error handling and fix race

$ benchcmp old.txt new.txt
benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15691         15686         -0.03%
BenchmarkProcessor_Run/nested_and_array_object-12     39673         39098         -1.45%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            158            +0.00%
BenchmarkProcessor_Run/nested_and_array_object-12     376            374            -0.53%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          8597          +0.00%
BenchmarkProcessor_Run/nested_and_array_object-12     20310         19798         -2.52%

* internal xml to json implementation

* Use internal xml to json decoder

benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15686         8051          -48.67%
BenchmarkProcessor_Run/nested_and_array_object-12     39098         20540         -47.47%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            75             -52.53%
BenchmarkProcessor_Run/nested_and_array_object-12     374            184            -50.80%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          3520          -59.06%
BenchmarkProcessor_Run/nested_and_array_object-12     19798         7824          -60.48%
benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15686         8051          -48.67%
BenchmarkProcessor_Run/nested_and_array_object-12     39098         20540         -47.47%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            75             -52.53%
BenchmarkProcessor_Run/nested_and_array_object-12     374            184            -50.80%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          3520          -59.06%
BenchmarkProcessor_Run/nested_and_array_object-12     19798         7824          -60.48%

* changelog fix

* Update docs

* Add godoc example of xml to json

* updating test name to fit Example naming convention

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 6839307)
@P1llus P1llus added v7.12.0 and removed needs_backport PR is waiting to be backported to other branches. labels Feb 15, 2021
P1llus added a commit that referenced this pull request Feb 15, 2021
* stashing before initial commit

* Initial commit

* updating go.sum

* updating it again

* adding feedback from PR comments and removing expandkeys config entry

* Updating changelog

* removing expanded_keys from allowed fields

* adding new changes based on PR comments, a few more changes remains

* moving the xml decoder to its own subpackage based on PR comments

* reverting back to Target being a string pointer, to be able to differentiate between null and empty string

* Updating certain tests to fit the new ignore_failure and ignore_missing options

* Updating unit test to test with missing field

* updating license headers

* adding benchmark test

* benchmark, now also with allocation results

* updating changelog entry

* removing duplicate Changelog entry

* changing changelog entry name to new name

* Simplify error handling and fix race

$ benchcmp old.txt new.txt
benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15691         15686         -0.03%
BenchmarkProcessor_Run/nested_and_array_object-12     39673         39098         -1.45%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            158            +0.00%
BenchmarkProcessor_Run/nested_and_array_object-12     376            374            -0.53%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          8597          +0.00%
BenchmarkProcessor_Run/nested_and_array_object-12     20310         19798         -2.52%

* internal xml to json implementation

* Use internal xml to json decoder

benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15686         8051          -48.67%
BenchmarkProcessor_Run/nested_and_array_object-12     39098         20540         -47.47%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            75             -52.53%
BenchmarkProcessor_Run/nested_and_array_object-12     374            184            -50.80%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          3520          -59.06%
BenchmarkProcessor_Run/nested_and_array_object-12     19798         7824          -60.48%
benchmark                                             old ns/op     new ns/op     delta
BenchmarkProcessor_Run/single_object-12               15686         8051          -48.67%
BenchmarkProcessor_Run/nested_and_array_object-12     39098         20540         -47.47%

benchmark                                             old allocs     new allocs     delta
BenchmarkProcessor_Run/single_object-12               158            75             -52.53%
BenchmarkProcessor_Run/nested_and_array_object-12     374            184            -50.80%

benchmark                                             old bytes     new bytes     delta
BenchmarkProcessor_Run/single_object-12               8597          3520          -59.06%
BenchmarkProcessor_Run/nested_and_array_object-12     19798         7824          -60.48%

* changelog fix

* Update docs

* Add godoc example of xml to json

* updating test name to fit Example naming convention

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 6839307)
v1v added a commit to v1v/beats that referenced this pull request Feb 16, 2021
…-arm

* upstream/master:
  [Metricbeat][Kubernetes] Extend state_node with more conditions (elastic#23905)
  [CI] googleStorageUploadExt step (elastic#24048)
  Check fields are documented for aws metricsets (elastic#23887)
  Update go-concert to 0.1.0 (elastic#23770)
  [Libbeat][New Processor] XML Decode (elastic#23678)
  Fix: bad substitution of API key (elastic#24036)
  [Filebeat] Add Pensando DFW Module (elastic#21063)
  [Filebeat] Check if processor is supported by ES version (elastic#23763)
  Syslog system tests: be more forgiving (elastic#24021)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement libbeat Team:Automation Label for the Observability productivity team Team:Elastic-Agent Label for the Agent team Team:Integrations Label for the Integrations team v7.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Discussion] Adding support/helpers/processors for XML in libbeat
8 participants